Analyzing Large Collections of Electronic Text Using OLAP

Authors

  • Steven Keith
  • Owen Kaser
  • Daniel Lemire

Abstract

Computer-assisted reading and analysis of text has applications in the humanities and social sciences. Ever-larger electronic text archives allow a more complete analysis, but they also force longer waits for results. On-Line Analytical Processing (OLAP) allows quick analysis of multidimensional data. By storing text-analysis information in an OLAP system, we can answer queries in seconds instead of minutes or hours. The analysis is user-driven, leaving users free to pursue their own directions of research.
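
The abstract does not describe the authors' schema, so the following is only a minimal sketch of the general idea, assuming a hypothetical toy corpus with three dimensions (author, year, term). It precomputes every group-by aggregate once (a "data cube"), so that an aggregate query afterwards is a single dictionary lookup instead of a rescan of the text. The corpus, the dimension names, and the helpers `fact_rows`, `build_cube` and `query` are all illustrative assumptions, not the system described in the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (author, year, text). Illustrative only, not data from the paper.
CORPUS = [
    ("Austen", 1811, "sense and sensibility and sense"),
    ("Austen", 1813, "pride and prejudice"),
    ("Dickens", 1859, "a tale of two cities and a tale"),
]

DIMENSIONS = ("author", "year", "term")  # assumed dimensions for this sketch
ALL = "*"  # marker for a dimension that has been rolled up (aggregated away)


def fact_rows(corpus):
    """Yield one (author, year, term) fact per token occurrence."""
    for author, year, text in corpus:
        for term in text.lower().split():
            yield (author, year, term)


def build_cube(corpus):
    """Precompute token counts for every subset of DIMENSIONS.

    Each key has one entry per dimension; rolled-up dimensions hold ALL.
    With d dimensions this materializes all 2**d group-bys in one pass.
    """
    cube = Counter()
    n = len(DIMENSIONS)
    for row in fact_rows(corpus):
        # Credit this fact to every group-by that keeps some subset of its dimensions.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(row[i] if i in kept else ALL for i in range(n))
                cube[key] += 1
    return cube


def query(cube, author=ALL, year=ALL, term=ALL):
    """Answer an aggregate count query with one lookup (0 if never seen)."""
    return cube[(author, year, term)]


if __name__ == "__main__":
    cube = build_cube(CORPUS)
    print(query(cube, term="and"))                     # occurrences of "and" overall
    print(query(cube, author="Austen", term="sense"))  # "sense" in Austen's texts
    print(query(cube))                                 # total number of tokens
```

For example, `query(cube, author="Austen", term="sense")` returns 2 for this toy corpus. Materializing every group-by grows as 2^d in the number of dimensions, which is why practical OLAP systems usually precompute only a selected subset of aggregates.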

Similar resources

On Modeling and Analysis of Multidimensional Geographic Databases

A Data Warehouse is a centralized repository of data acquired from external data sources and organized following a multidimensional model (Inmon, 1996) in order to be analyzed by On-Line Analytical Processing (OLAP) applications. OLAP tools provide the ability to interactively explore multidimensional data presenting detailed and aggregated data. The results of analyses are the basis of strateg...

XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses

Recently, a large number of XML documents are available on the Internet. This trend motivated many researchers to analyze them multi-dimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses where every fact data as well as dimension data are stored as XML...

Text Mining with the WEBSOM

The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps a...

OLAP Cube Visualization of Environmental Data Catalogs

Systems like SciScope and Data Access System for Hydrology (DASH) rely on data catalogs to facilitate data discovery. These catalogs describe several nation-wide data repositories that are important for scientists including US Geological Survey’s National Water Information System (NWIS), Environmental Protection Agency’s STOrage and RETrieval System (EPA STORET) and National Climatic Data Cente...

Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this ...

Journal:
  • CoRR

Volume: abs/cs/0605127

Publication date: 2005